Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PythonCall extension to convert xarray objects #882

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

JamesWrigley
Copy link

This is quite handy for interop with Python. There's a couple of things I might need a hand with:

  • The tests create a 2D DataArray with only one dimension having a coordinate. When I run the test code in the REPL, lookup(y, :length) returns NoLookup, which is what I expect. But in the tests it seems to automatically create a lookup of 1:5 and I'm not sure where that's coming from because it's not being created by the pyconvert() methods.
  • I couldn't figure out how to add a new page to the docs with the modifications in the PR. Not sure if I'm building them wrong or if there's something missing.

Also note that the tests create a shared conda environment named @dimensionaldata-tests, I went for a shared environment instead of a local one because this way it gets stored in the depot and will get cached by julia-actions/cache.

@JamesWrigley
Copy link
Author

(I noticed that CI wasn't caching anything anyway so I took the liberty of adding it in d3c1730)

@felixcremer
Copy link
Contributor

There is also https://github.com/meggart/PyYAXArrays.jl which might give some inspiration.

@rafaqz
Copy link
Owner

rafaqz commented Dec 9, 2024

But in the tests it seems to automatically create a lookup of 1:5

Can you show where you mean? NoLookup is always just a Base.OneTo(length) underneath because a Lookup is an AbstractArray and it needs length, so may as well be the indices. That will sometimes display as a 1:5 vector

@rafaqz
Copy link
Owner

rafaqz commented Dec 9, 2024

But this looks really nice to have overall.

Should we also handle/test more complicated lookups like DateTime? And is there information from xarray if a lookup is points or intervals, categorical etc?

@JamesWrigley
Copy link
Author

JamesWrigley commented Dec 9, 2024

Can you show where you mean? NoLookup is always just a Base.OneTo(length) underneath because a Lookup is an AbstractArray and it needs length, so may as well be the indices. That will sometimes display as a 1:5 vector

Ah I'm dumb... the test was checking if the lookup == NoLookup() instead of isa NoLookup.

Should we also handle/test more complicated lookups like DateTime?

Agreed that it'd be useful, but I do not currently have the courage to tackle that hornets nest 😅

And is there information from xarray if a lookup is points or intervals, categorical etc?

No I don't think that kind of information is supported yet according to the docs: https://docs.xarray.dev/en/stable/user-guide/data-structures.html#coordinates
But it may come in the future: pydata/xarray#8463

When reading that page I realized that these conversion methods won't work in two situations:

  • Multiple coordinates/lookups assigned to a single dimension.
  • Multi-dimensional coordinates/lookups that are assigned to multiple dimensions.

Does DimensionalData support those at all? From the docs I can't figure out how to construct a DimArray for either situation.

@rafaqz
Copy link
Owner

rafaqz commented Dec 9, 2024

When reading that page I realized that these conversion methods won't work in two situations:
Multiple coordinates/lookups assigned to a single dimension.
Multi-dimensional coordinates/lookups that are assigned to multiple dimensions.
Does DimensionalData support those at all? From the docs I can't figure out how to construct a DimArray for either situation.

I'm not sure how common the first would be, or if I understand totally, what's the use case?

The second I have just never implemented but isn't too hard in theory, all the machinery is in place with Transformed lookups we "just" need to swap the functions for matrices. (But it's also very rare in the wild)

My guess is merging this without DateTime handling workingwill lead to a bunch of bug reports. But won't pythoncall just handle that for us? Did you try it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants